AITopics | low-rank compression

Collaborating Authors

low-rank compression

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition

Neural Information Processing SystemsApr-25-2026, 06:02:32 GMT

We present a novel global compression framework for deep neural networks that automatically analyzes each layer to identify the optimal per-layer compression ratio, while simultaneously achieving the desired overall compression.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Genre: Research Report (0.68)

Industry: Government (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

2adcfc3929e7c03fac3100d3ad51da26-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 00:22:56 GMT

compression, international conference, neural network, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.68)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees

Chen, Chuyan, He, Yutong, Li, Pengrui, Jia, Weichen, Yuan, Kun

arXiv.org Artificial IntelligenceOct-21-2025

Abstract--Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized approaches project gradients onto randomly chosen subspaces, introducing high variance and degrading empirical performance; greedy methods select the most informative subspaces, achieving strong empirical results but lacking convergence guarantees. T o address this gap, we propose GreedyLore--the first Greedy L ow-R ank gradie nt compression algorithm for distributed learning with rigorous convergence guarantees. GreedyLore incorporates error feedback to correct the bias introduced by greedy compression and introduces a semi-lazy subspace update that ensures the compression operator remains contractive throughout all iterations. With these techniques, we prove that GreedyLore achieves a convergence rate of O(σ/ NT+1/T) under standard optimizers such as MSGD and Adam--marking the first linear speedup convergence rate for low-rank gradient compression. Extensive experiments are conducted to validate our theoretical findings. Index T erms--Distributed Learning, Low-Rank Compression, Communication-Efficient Optimization, Error Feedback. ISTRIBUTED optimization is a promising paradigm for addressing large-scale problems in signal processing and machine learning. In distributed algorithms, each computing node processes its local data while collaborating with others to minimize a global loss function. This approach mitigates the computational burden on individual nodes, reduces memory requirements, and enables efficient parallel computation.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.08784

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Appendix for: Data-Aware Low-Rank Compression for Large NLP Models A Proof of Theorem 1 Theorem 1

Neural Information Processing SystemsAug-18-2025, 22:00:53 GMT

In addition, a pre-defined search grid is also necessary. With these input parameters, we firstly distribute the total allowed loss into each individual module. First, it's indeed a trade-off between the efficiency and efficacy as the speedup ratio goes higher at the cost of lower Thus, in the real application, users need to decide what's the best We could have chose another cutoff like 1 % accuracy with lower speedup ratio to report, but this won't help too much when comparing different baseline methods. D.1 LSTM result A 2-layer LSTM model is composed of two large matrices layers and one large softmax layer. Thus, despite the matrix is much smaller and well approximated by DRONE, the overall acceleration on GPU is less.

artificial intelligence, inference time, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Add feedback

Low-Rank Compression for IMC Arrays

Jeon, Kang Eun, Rhe, Johnny, Ko, Jong Hwan

arXiv.org Artificial IntelligenceFeb-10-2025

In this study, we address the challenge of low-rank model compression in the context of in-memory computing (IMC) architectures. Traditional pruning approaches, while effective in model size reduction, necessitate additional peripheral circuitry to manage complex dataflows and mitigate dislocation issues, leading to increased area and energy overheads. To circumvent these drawbacks, we propose leveraging low-rank compression techniques, which, unlike pruning, streamline the dataflow and seamlessly integrate with IMC architectures. However, low-rank compression presents its own set of challenges, namely i) suboptimal IMC array utilization and ii) compromised accuracy. To address these issues, we introduce a novel approach i) employing shift and duplicate kernel (SDK) mapping technique, which exploits idle IMC columns for parallel processing, and ii) group low-rank convolution, which mitigates the information imbalance in the decomposed matrices. Our experimental results demonstrate that our proposed method achieves up to 2.5x speedup or +20.9% accuracy boost over existing pruning techniques.

artificial intelligence, imc array, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.0782

Country: Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks

Zhang, Shihao, Saab, Rayan

arXiv.org Artificial IntelligenceFeb-4-2025

Deep neural networks have achieved state-of-the-art performance across numerous applications, but their high memory and computational demands present significant challenges, particularly in resource-constrained environments. Model compression techniques, such as low-rank approximation, offer a promising solution by reducing the size and complexity of these networks while only minimally sacrificing accuracy. In this paper, we develop an analytical framework for data-driven post-training low-rank compression. We prove three recovery theorems under progressively weaker assumptions about the approximate low-rank structure of activations, modeling deviations via noise. Our results represent a step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches and towards theoretically grounded compression algorithms that reduce inference costs while maintaining performance.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2502.02766

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

Powerful Design of Small Vision Transformer on CIFAR10

Wu, Gent

arXiv.org Artificial IntelligenceJan-6-2025

Vision Transformers (ViTs) have demonstrated remarkable success on large-scale datasets, but their performance on smaller datasets often falls short of convolutional neural networks (CNNs). This paper explores the design and optimization of Tiny ViTs for small datasets, using CIFAR-10 as a benchmark. We systematically evaluate the impact of data augmentation, patch token initialization, low-rank compression, and multi-class token strategies on model performance. Our experiments reveal that low-rank compression of queries in Multi-Head Latent Attention (MLA) incurs minimal performance loss, indicating redundancy in ViTs. Additionally, introducing multiple CLS tokens improves global representation capacity, boosting accuracy. These findings provide a comprehensive framework for optimizing Tiny ViTs, offering practical insights for efficient and effective designs. Code is available at https://github.com/erow/PoorViTs.

accuracy, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.0622

Country:

Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Surrey > Guildford (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Ji, Yixin, Xiang, Yang, Li, Juntao, Chen, Wei, Liu, Zhongyi, Chen, Kehai, Zhang, Min

arXiv.org Artificial IntelligenceMay-17-2024

In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in low-rank factorization and low-rank dimensions allocation. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for LLMs. This approach involves precise estimation of feature distributions through pooled covariance matrices and a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.

bolaco, compression, low-rank compression, (15 more...)

arXiv.org Artificial Intelligence

2405.10616

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks

Xiao, Jinqi, Zhang, Chengming, Gong, Yu, Yin, Miao, Sui, Yang, Xiang, Lizhi, Tao, Dingwen, Yuan, Bo

arXiv.org Artificial IntelligenceFeb-1-2023

Low-rank compression is an important model compression strategy for obtaining compact neural network models. In general, because the rank values directly determine the model complexity and model accuracy, proper selection of layer-wise rank is very critical and desired. To date, though many low-rank compression approaches, either selecting the ranks in a manual or automatic way, have been proposed, they suffer from costly manual trials or unsatisfied compression performance. In addition, all of the existing works are not designed in a hardware-aware way, limiting the practical performance of the compressed models on real-world hardware platforms. To address these challenges, in this paper we propose HALOC, a hardware-aware automatic low-rank compression framework. By interpreting automatic rank selection from an architecture search perspective, we develop an end-to-end solution to determine the suitable layer-wise ranks in a differentiable and hardware-aware way. We further propose design principles and mitigation strategy to efficiently explore the rank space and reduce the potential interference problem. Experimental results on different datasets and hardware platforms demonstrate the effectiveness of our proposed approach. On CIFAR-10 dataset, HALOC enables 0.07% and 0.38% accuracy increase over the uncompressed ResNet-20 and VGG-16 models with 72.20% and 86.44% fewer FLOPs, respectively. On ImageNet dataset, HALOC achieves 0.9% higher top-1 accuracy than the original ResNet-18 model with 66.16% fewer FLOPs. HALOC also shows 0.66% higher top-1 accuracy increase than the state-of-the-art automatic low-rank compression solution with fewer computational and memory costs. In addition, HALOC demonstrates the practical speedups on different hardware platforms, verified by the measurement results on desktop GPU, embedded GPU and ASIC accelerator.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.09422

Country:

North America > United States > Washington (0.04)
North America > United States > Indiana (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

A Highly Effective Low-Rank Compression of Deep Neural Networks with Modified Beam-Search and Modified Stable Rank

Eo, Moonjung, Kang, Suhyun, Rhee, Wonjong

arXiv.org Artificial IntelligenceNov-30-2021

Compression has emerged as one of the essential deep learning research topics, especially for the edge devices that have limited computation power and storage capacity. Among the main compression techniques, low-rank compression via matrix factorization has been known to have two problems. First, an extensive tuning is required. Second, the resulting compression performance is typically not impressive. In this work, we propose a low-rank compression method that utilizes a modified beam-search for an automatic rank selection and a modified stable rank for a compression-friendly training. The resulting BSR (Beam-search and Stable Rank) algorithm requires only a single hyperparameter to be tuned for the desired compression ratio. The performance of BSR in terms of accuracy and compression ratio trade-off curve turns out to be superior to the previously known low-rank compression methods. Furthermore, BSR can perform on par with or better than the state-of-the-art structured pruning methods. As with pruning, BSR can be easily combined with quantization for an additional compression.

compression, low-rank compression, neural network, (15 more...)

arXiv.org Artificial Intelligence

2111.15179

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback